As the issue rate and pipeline depth of high-performance superscalar processors 
increase, an excellent branch predictor becomes more vital to delivering 
the potential performance of a wide-issue, deeply pipelined microarchitecture. We propose a 
new dynamic branch predictor (Two-Level Adaptive Branch Prediction) that achieves 
substantially higher accuracy than any other scheme reported in the literature. The 
mechanism uses two levels of branch history information to make ... 

Recent attention to speculative execution as a mechanism for increasing performance of 
single instruction streams has demanded substantially better branch prediction than what 
has been previously available. We [1,2] and Pan, So, and Rahmeh [4] have both proposed 
variations of the same aggressive dynamic branch predictor for handling those needs. We 
call the basic model Two-Level Adaptive Branch Prediction; Pan, So, and Rahmeh call it 
Correlation Branch Prediction. In this paper, we adopt th ... 

In this paper, we examine the benefits of the early resolution of branch instructions and the 
impact of unresolved branches on history-based branch prediction schemes by using two 
new metrics that are more revealing than branch prediction accuracy alone. We first briefly 
review a number of branch prediction schemes and introduce two new branch prediction 
scheme performance metrics. We then utilize these metrics to gauge the improvement in 
branch prediction scheme performance when only the outcom ... 

The need for multiple branch prediction is inherent to wide instruction fetching. This paper 
presents a completion time multiple branch predictor called the Tree-based Multiple Branch 
Predictor (TMP) that builds on previous single branch prediction techniques. It employs a 
tree structure of branch predictors, or tree-node predictors, and achieves accurate multiple 
branch prediction by leveraging the high accuracies of the individual branch predictors. A 
highly-efficient TMP design u ... 

Value prediction attempts to eliminate true-data dependencies by dynamically predicting 
the outcome values of instructions and executing true-data dependent instructions based on 
that prediction. In this paper we attempt to understand the limitations of using this 
paradigm in realistic machines. We show that the instruction-fetch bandwidth and the issue 
rate have a very significant impact on the efficiency of value prediction. In addition, we 
study how recent techniques to improve the instructio ... 

Most large shared-memory multiprocessors use directory protocols to keep per-processor 
caches coherent. Some memory references in such systems, however, suffer long latencies 
for misses to remotely-cached blocks. To ameliorate this latency, researchers have 
augmented standard coherence protocols with optimizations for specific sharing patterns, 
such as read-modify-write, producer-consumer, and migratory sharing. This paper seeks to 
replace these directed solutions with general prediction logic t ... 

Data compression and prediction are closely related. Thus prediction methods based on 
data compression algorithms have been suggested for the branch prediction problem. In 
this work we consider two universal compression algorithms: prediction by partial matching 
(PPM), and a recently developed method, context tree weighting (CTW). We describe the 
prediction algorithms induced by these methods. We also suggest adaptive algorithms — 
variations of the basic methods that attempt to fit limited mem ... 

This paper explores the possibility of using program profiling to enhance the efficiency of 
value prediction. Value prediction attempts to eliminate true-data dependencies by 
predicting the outcome values of instructions at run-time and executing true-data 
dependent instructions based on that prediction. So far, all published papers in this area 
have examined hardware-only value prediction mechanisms. In order to enhance the 
efficiency of value prediction, it is proposed to employ program profil ... 

Multiple issue of instructions occurs in superscalar and VLIW machines. This paper 
investigates a third type of machine design, which combines the advantages of code 
compatibility as in superscalars and the absence of complex dependency-checking logic 
from the decoder as in VLIW. In this design, a stream of scalar instructions is executed by 
the hardware and is simultaneously compacted into VLIW-type instructions, which are then 
stored in a structure called a shadow cache. When a shadow cac ... 

In a 64-bit processor, many of the data values actually used in computations require much 
narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit 
very strong temporal locality and describe mechanisms to accurately predict data-widths. To 
exploit the predictability of data-widths, we propose a Multi-Bit-Width (MBW) 
microarchitecture which, when the opportunity arises, takes the wires normally used to 
route the operands and bypass the result of a 64-bit instruction, ... 

Pipeline stalls due to conditional branches represent one of the most significant 
impediments to realizing the performance potential of deeply pipelined, superscalar 
processors. Many branch predictors have been proposed to help alleviate this problem, 
including the Two-Level Adaptive Branch Predictor, and more recently, two-component 
hybrid branch predictors. In a less idealized environment, such as a time-shared system, 
code of interest involves context switches. Context switches, even at fairly ... 

Previous branch prediction studies have relied primarily upon the SPECint89 and SPECint92 
benchmarks for evaluation. Most of these benchmarks exercise a very small amount of 
code. As a consequence, the resources required by these schemes for accurate predictions 
of larger programs have not been clear. Moreover, many of these studies have simulated a 
very limited number of configurations. Here we report on simulations of a variety of branch 
prediction schemes using a set of relatively large bench ... 